






ON THE THEORY OF CORRELATION WITH SPECIAL REFERENCE
TO CERTAIN SIGNIFICANT LOCI ON THE PLANE OF DISTRIBUTION
IN THE CASE OF NORMAL CORRELATION.

By H. L. Rietz.

1. Introduction. — The notion of correlation is of such importance in
science that it seems it should become almost as familiar to the scientist as
the notions of a mathematical function and of independence in the
probability sense. The purposes of the present paper are (1) to present the
elements of a theory of correlation from assumptions that are suggested
by applications and that seem to appeal to the mathematical student
beginning the study of statistics, (2) to give a few properties of normally
correlated statistical data by means of curves or contour lines on what I
call the plane of distribution.

Take X and Y to represent associated classes of individuals that have 
definite values or with attributes that have definite values. These classes 
may represent any one of a great variety of concrete situations. To illus- 
trate, X may represent rainfalls at a given place in months of April and Y 
those in months of June; X and Y may refer to fathers and sons, when the 
inheritance of some character is in question; X and Y may represent 
statures of husbands and of their wives; X may represent hours per day 
worked by a laborer, and Y the corresponding wages paid. These illustrations
very naturally suggest the following questions that show the purpose
of a theory of correlation: Is there a measurable tendency for wet Aprils to
be followed by dry Junes? To what extent do a class of men, in general,
resemble their fathers with respect to some character, say stature? Do tall
men, in general, marry tall wives? Do high wages go with short hours of
labor? 

The problem that we set is to describe, by some summary method, the 
tendency of corresponding individuals of X and Y to vary in the same or in 
opposite directions, when the variations are to be attributed to an indefinite 
number of unassignable causes. It is well known that the relationship of 
corresponding individuals of X and Y, in the illustrations cited above, is 
not such a perfect dependence as is given by a mathematical function. 
When a value is assigned to X, the corresponding values of Y have a certain 
amount of freedom. They may, however, be far from free in the prob- 
ability sense of freedom or independence. 


A relation expressed by a mathematical function and independence in
the probability sense may be regarded as two extremes between which
there exists a large region for a theory of correlation.

2. Table of Double Classification. — To be precise, let

$$x_1, x_2, x_3, \ldots, x_n$$

be values of the individuals of a random sample of n drawn from the class X.
In the language of statistics, we call these x's variates. Let y's represent the
corresponding variates of the class Y. That is to say, X and Y are associated
so that

$$(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$$

are pairs of variates in correspondence.

The first step in the description of the dependence of the two attributes 
is the construction of a table* of double classification of the following form: 



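          |  X_1      X_2      X_3      ...   X_s      ...   X_l     |
   -------+----------------------------------------------------------+---------
    Y_m   |  N_{1m}   N_{2m}   N_{3m}   ...   N_{sm}   ...   N_{lm}  |  N_{ym}
    ...   |  ...      ...      ...            ...            ...     |  ...
    Y_t   |  N_{1t}   N_{2t}   N_{3t}   ...   N_{st}   ...   N_{lt}  |  N_{yt}
    ...   |  ...      ...      ...            ...            ...     |  ...
    Y_3   |  N_{13}   N_{23}   N_{33}   ...   N_{s3}   ...   N_{l3}  |  N_{y3}
    Y_2   |  N_{12}   N_{22}   N_{32}   ...   N_{s2}   ...   N_{l2}  |  N_{y2}
    Y_1   |  N_{11}   N_{21}   N_{31}   ...   N_{s1}   ...   N_{l1}  |  N_{y1}
   -------+----------------------------------------------------------+---------
          |  N_{x1}   N_{x2}   N_{x3}   ...   N_{xs}   ...   N_{xl}  |  n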



Fig. 1. 

The symbols $X_1, X_2, \ldots$ in the top row of the table mark subclasses
covering, in general, equal intervals on the range that includes the entire
class X. Similarly, $Y_1, Y_2, \ldots$ in the column on the left mark subclasses that
are taken to include, in general, equal intervals of the range that includes
the entire class Y. The number of such subclasses may be two or more.

In this table (Fig. 1), any number, say $N_{st}$ ($s \neq x$, $s \neq y$), indicates the
number that belongs to both subclasses $X_s$ and $Y_t$. The vertical column of
frequencies $N_{s1}, N_{s2}, N_{s3}, \ldots, N_{st}, \ldots$ corresponding to any mark $X_s$ is called
an x-array of y's. Similarly, any row of frequencies $N_{1t}, N_{2t}, N_{3t}, \ldots,
N_{st}, \ldots$ corresponding to any mark $Y_t$ is a y-array of x's.

The numbers in the lower row give the sums of numbers N in columns,
and show the frequency distribution of the total sample of n into subclasses
$X_1, X_2, \ldots$. Likewise, the column of totals on the right is the frequency
distribution into subclasses $Y_1, Y_2, \ldots$.
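The tabulation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `double_classification`, the half-open interval convention, and the toy sample are assumptions made here, not anything taken from the paper.

```python
from collections import Counter

def double_classification(pairs, x_edges, y_edges):
    """Count pairs (x, y) into the cells of a double classification table.

    x_edges and y_edges are ascending subclass boundaries; a value v is put
    in subclass i when edges[i] <= v < edges[i + 1] (an assumed convention).
    Returns (cells, col_totals, row_totals, n), where cells[(s, t)] plays the
    role of N_st, col_totals of the lower row, row_totals of the right column.
    """
    def subclass(v, edges):
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                return i
        return None  # value lies outside the tabulated range

    cells = Counter()
    for x, y in pairs:
        s, t = subclass(x, x_edges), subclass(y, y_edges)
        if s is not None and t is not None:
            cells[(s, t)] += 1

    n_cols, n_rows = len(x_edges) - 1, len(y_edges) - 1
    col_totals = [sum(cells[(s, t)] for t in range(n_rows)) for s in range(n_cols)]
    row_totals = [sum(cells[(s, t)] for s in range(n_cols)) for t in range(n_rows)]
    return cells, col_totals, row_totals, sum(col_totals)

# Hypothetical toy sample, not data from the paper.
pairs = [(1.0, 2.5), (1.5, 2.0), (2.5, 3.5), (3.5, 3.0), (3.2, 1.5), (0.5, 0.5)]
cells, col_totals, row_totals, n = double_classification(pairs, [0, 2, 4], [0, 2, 4])
```

Each column of `cells` is then an x-array of y's and each row a y-array of x's in the sense used above.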



* Such a table used to study the correlation of statistical data seems to have been employed 
first by Francis Galton, Proc. Royal Society, XL, p. 68. 




3. Test of Independence. — If the two classes X and Y are independent
in the probability sense, the best estimate (from our sample of n) that can
be given, from the separate classes, of the probability that a pair $(x_p, y_p)$
taken at random is such that $x_p$ belongs to the subclass $X_s$ and $y_p$ to $Y_t$, is
the product

$$\frac{N_{xs}\,N_{yt}}{n^2}, \tag{1}$$

where $N_{xs}$ and $N_{yt}$ are the totals for the subclasses $X_s$ and $Y_t$ in the lower
row and the right-hand column of the table.

But, from the table (Fig. 1), the best estimate that a pair belongs to
both these classes is $N_{st}/n$. If the deviations

$$\delta = \frac{N_{st}}{n} - \frac{N_{xs}\,N_{yt}}{n^2}, \tag{2}$$

when extended to all compartments of the table, are greater than may be
attributed to fluctuations in drawing random samples, there is lack of
independence. If the differences (2) are to be attributed to smallness of the
sample, we say the difference is insignificant. A test as to whether the
deviations $\delta$ are to be regarded as significant has been given by Sheppard.*
If the classes X and Y are not independent, and are not absolutely
dependent in the sense that a mathematical function determines the one
when the other is given; then, we may say, in a general way, that a theory
of correlation is required to describe the association of the two classes X
and Y.
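As a small worked illustration of the deviations in (2), the following sketch computes the difference between the observed cell proportion and the product of the marginal proportions for every compartment. The function name and the 2 x 2 counts are hypothetical; the sketch does not implement Sheppard's significance test, only the deviations themselves.

```python
def independence_deviations(table):
    """table[t][s] holds the cell count N_st.

    Returns delta[t][s] = N_st / n - (column total s) * (row total t) / n**2,
    i.e. the deviation of expression (2) for every compartment.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # totals of the y subclasses
    col_totals = [sum(col) for col in zip(*table)]  # totals of the x subclasses
    return [
        [table[t][s] / n - col_totals[s] * row_totals[t] / n**2
         for s in range(len(col_totals))]
        for t in range(len(row_totals))
    ]

# Hypothetical counts, chosen only to show the arithmetic.
table = [[30, 10],
         [10, 30]]
deltas = independence_deviations(table)
```

Here every marginal proportion is 1/2, so the product in (1) is 1/4 for each cell, while the observed proportions are 3/8 and 1/8; the deviations are therefore +1/8 and -1/8, and they sum to zero over the table.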

4. General Description. — In many applications, where the correlation 
is considerable, a certain amount of progress in treating the association of 
X and Y can be made by a consideration of the arrays of the double entry 
table of values without the use of special mathematical methods. That is 
to say, we may treat each array as a frequency distribution, and find some 
kind of average values for the arrays and for the variability from this 
average. Such a treatment is, however, obviously inadequate for many 
purposes. 

Suppose we erect at the center of each rectangle (Fig. 1) of the table a 
perpendicular to the plane of the rectangle proportional in length to the 
number in that rectangle. The ends of these perpendiculars suggest a 
surface to describe the distribution. 

5. Geometrical Description. — To follow the suggestion just mentioned,
it is convenient to use $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ as deviations from the
means of variates of the respective classes X and Y rather than for the
actual values of the variates themselves. They will be used in this sense
throughout the remainder of the paper.

* Philosophical Transactions of the Royal Society, vol. 192A, 1899, p. 128. 




Represent these deviations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ from the mean
values as coordinates of points in a plane (Fig. 2*). I call this plane the
plane of distribution. It is the purpose of a theory of correlation to
characterize the arrangement of such points without special reference to
individual points. Divide the range along the x-axis that would include all
points of the class into equal intervals $\Delta x$. Likewise, divide the range along
the y-axis into equal intervals $\Delta y$. The points included by two parallels
to the y-axis (AB) may be said to constitute an x-array of y's. Similarly,
the points included by parallels (A'B') are said to be a y-array of x's.




It is reasonable to assume that some function $f(x)$ exists such that

$$f(x)\,\Delta x \tag{3}$$

is the probability that a variate taken at random from class X gives a point
in an array marked by a $\Delta x$, and that some function $\varphi(y)$ is such that

$$\varphi(y)\,\Delta y \tag{4}$$

is the probability that a variate taken at random from Y gives a point in an
array marked $\Delta y$. The table of double classification (Fig. 1) with
perpendiculars to the plane at the centers of rectangles suggested the idea of
a surface. To follow this suggestion, we consider an area $\Delta x\,\Delta y$ at the
intersection of any two arrays marked $\Delta x$ and $\Delta y$. It may be assumed that



* This distribution of points is not made up in an entirely artificial manner, but represents 
approximately the distribution of a class of husbands and wives with respect to stature. See 
Pearson and Lee, Biometrika, vol. 2, p. 408. 




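a function $z = \psi(x, y)$ exists such that

$$\psi(x, y)\,\Delta x\,\Delta y \tag{5}$$

gives the probability that a pair $(x_p, y_p)$ taken at random gives a point in
$\Delta x\,\Delta y$.

We may obviously add the conditions

$$\int_{-\infty}^{+\infty} f(x)\,dx = 1, \quad \int_{-\infty}^{+\infty} \varphi(y)\,dy = 1, \quad \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \psi(x, y)\,dx\,dy = 1, \tag{6}$$

if the functions in (3), (4), and (5) are integrable from $-\infty$ to $+\infty$.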

To render more precise and useful the idea of independence discussed in 
§ 3, we may now use the notation of § 5, and define that the two classes X 
and Y are independent if 

$$\psi(x, y) = f(x)\,\varphi(y) \tag{7}$$

is an identity. If this equality is not an identity, the two classes are
correlated. From this general negative definition of correlation, it seems
difficult to determine the character of $\psi(x, y)$ so as to give it a practical value.
We seek therefore an affirmative definition at some loss of generality. To 
obtain a valuable affirmative definition, we fix our attention on the mean 
values of those variates of one class, say of Y, that are in an array marked 
Ax. Or, we may regard the points (Fig. 2) as particles of equal mass and 
fix our attention on the centroid of these particles. With the limiting 
case of an indefinitely large number of variates and small values of Ax, 
it seems reasonable, since mean values have variations of a higher order of 
smallness than individuals, that these centroids, in general, arrange them- 
selves along a smooth curve. That is to say, there may be a correspondence, 
given by a mathematical function, between any assigned X and the centroid 
of corresponding points of Y. 
With reference to the surface

$$z = \psi(x, y),$$

this means that the y coordinate, $\bar{y}$, of the centroid of any section by
assigning x is a function of x. Suppose that

$$\bar{y} = \frac{\int y\,\psi(x, y)\,dy}{\int \psi(x, y)\,dy} = \theta(x) \tag{8}$$

for any assigned x.

As an affirmative, though special definition of correlation, we define that
the class Y is correlated with X if $\theta(x)$ is different from zero. As we have
selected the origin so that $\theta(x)$ cannot be a constant different from zero,
the simplest case of correlation we have to consider is that for which

$$\theta(x) = mx + b, \tag{9}$$

a linear function of x.

It affords a vivid description to interpret $\theta(x)$ on the plane of distribution
(Fig. 2). It is sometimes called the curve of regression of y's on x's.
Moreover, when $\theta(x)$ is linear, it is said that there is linear regression. Obviously,
another curve of regression of x's on y's exists.

To obtain the m and b to apply to any numerical case, we make use of
our sample of n. If the line (9) is subjected to the least squares condition
that m and b are to be determined so that the sum of the squares of its
deviations (measured parallel to the y-axis) from the means of arrays
(weighted with the number in the arrays) is to be a minimum, we obtain*

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x, \tag{10}$$

where $\sigma_x^2$ is the mean square of $x_1, x_2, \ldots, x_n$,
$\sigma_y^2$ is the mean square of $y_1, y_2, \ldots, y_n$,
and r is defined by the formula

$$r = \frac{\sum_{q=1}^{n} x_q y_q}{n\,\sigma_x\sigma_y}.$$

The $\sigma$'s are called the "standard deviations" of the systems of variates
and r is called the "correlation coefficient."

The line (10) is the line of distribution of the means of y's that
correspond to assigned x's. By analogy,

$$x = r\,\frac{\sigma_x}{\sigma_y}\,y \tag{11}$$

is the line of regression of x's on y's. From our special definition, there is
no correlation if r = 0.
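The quantities entering (10) and (11) can be computed directly from a sample. The sketch below is illustrative only: the function name and the five-point sample are hypothetical, and the standard deviations are taken as root mean squares of deviations from the means (dividing by n), following the definitions in the text.

```python
import math

def correlation_summary(xs, ys):
    """Return (sigma_x, sigma_y, r, slope_y_on_x, slope_x_on_y).

    Deviations are measured from the sample means, and the sigmas are root
    mean squares of those deviations, as in the definitions around (10).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sigma_x = math.sqrt(sum(d * d for d in dx) / n)
    sigma_y = math.sqrt(sum(d * d for d in dy) / n)
    r = sum(a * b for a, b in zip(dx, dy)) / (n * sigma_x * sigma_y)
    return sigma_x, sigma_y, r, r * sigma_y / sigma_x, r * sigma_x / sigma_y

# Hypothetical sample, not data from the paper.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
sigma_x, sigma_y, r, slope_y_on_x, slope_x_on_y = correlation_summary(xs, ys)
```

For this sample both mean squares equal 2 and the sum of products of deviations is 8, so r = 8 / (5 * 2) = 0.8, and both regression slopes equal 0.8.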

7. Standard Deviation of Arrays. — The mean square of the deviations
of all points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ [Fig. 2] from corresponding points
on the line

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x$$

is given by



* See Fig. 2. Line CD is the line of regression of wives with respect to husbands, and EF
is the line of regression of husbands with respect to wives.






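$$\frac{1}{n}\sum_{q=1}^{n}\left(y_q - r\,\frac{\sigma_y}{\sigma_x}\,x_q\right)^2 = \sigma_y^2 - 2r\,\frac{\sigma_y}{\sigma_x}\cdot\frac{\sum x_q y_q}{n} + r^2\,\frac{\sigma_y^2}{\sigma_x^2}\,\sigma_x^2$$

$$= \sigma_y^2 - 2r^2\sigma_y^2 + r^2\sigma_y^2$$

$$= \sigma_y^2(1 - r^2).$$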
Suppose that the regression is linear, so that the centroid of the x-arrays
of y's may be taken on the line

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x,$$

and that the standard deviations of x-arrays of y's are equal. Then
$\sigma_y^2(1 - r^2)$ becomes the square of the standard deviation of each x-array
of y's. When the standard deviations of these arrays are unequal,
$\sigma_y^2(1 - r^2)$ is merely a sort of an average value of the square of standard
deviations of arrays.

Similarly, $\sigma_x^2(1 - r^2)$ is the square of the standard deviation of a
y-array of x's.
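The identity derived in this section can be checked numerically on any sample. The following sketch, with a hypothetical function name and a hypothetical five-point sample, compares the mean square of deviations from the regression line of y on x with the value of the standard deviation of y squared times (1 - r^2).

```python
import math

def residual_check(xs, ys):
    """Return (mean square deviation from the regression line of y on x,
    sigma_y**2 * (1 - r**2)); the two should agree up to rounding."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    var_x = sum(d * d for d in dx) / n
    var_y = sum(d * d for d in dy) / n
    r = sum(a * b for a, b in zip(dx, dy)) / (n * math.sqrt(var_x * var_y))
    slope = r * math.sqrt(var_y / var_x)  # slope of the regression line of y on x
    mean_square = sum((b - slope * a) ** 2 for a, b in zip(dx, dy)) / n
    return mean_square, var_y * (1 - r * r)

# Hypothetical sample, not data from the paper.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
mean_square, predicted = residual_check(xs, ys)
```

With r = 0.8 and a mean square of 2 for the y's, both quantities come to 2 * (1 - 0.64) = 0.72.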

8. The Normal Correlation Surface. — The normal correlation surface
dates back to a memoir by Bravais* in 1846. The importance of this surface
in the mathematics of statistics was first recognized by Galton† and has
been fully demonstrated by the work of Pearson,‡ Edgeworth, and Yule.

The equation of the normal surface may be written in the form

$$z = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1 - r^2)}\left[\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right]}. \tag{13}$$

For certain associated classes X and Y, this is the form of $\psi(x, y)$ [§ 5, (5)].
When $\psi(x, y)$ takes this form, the correlation is said to be normal.
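For readers who wish to experiment with the surface (13), a minimal sketch follows. The function name, the parameter values, and the crude rectangle-rule integration are assumptions made for illustration; the check is simply that the volume under the surface is close to 1, as condition (6) requires.

```python
import math

def normal_surface(x, y, sx, sy, r):
    """Height z of the normal correlation surface (13) at the point (x, y)."""
    q = (x / sx) ** 2 + (y / sy) ** 2 - 2 * r * x * y / (sx * sy)
    norm = 2 * math.pi * sx * sy * math.sqrt(1 - r * r)
    return math.exp(-q / (2 * (1 - r * r))) / norm

# Hypothetical parameter values.
sx, sy, r = 1.0, 1.5, 0.5

# Rectangle-rule estimate of the volume under the surface on [-10, 10]^2.
step, half = 0.1, 100
volume = sum(
    normal_surface(i * step, j * step, sx, sy, r)
    for i in range(-half, half + 1)
    for j in range(-half, half + 1)
) * step * step

peak = normal_surface(0.0, 0.0, sx, sy, r)  # height of the surface at the centroid
```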

It is easy to give many properties of this surface from certain sets of 
which equation (13) may be derived. One set of conditions that character- 
ize the surface and that suggest themselves very naturally in attempting a 
description of certain double entry tables of classification may be stated 
as follows: 

(a) The regression of at least one set of variates on the other is linear.
Applied to the surface, this condition means that

$$\theta(x) = r\,\frac{\sigma_y}{\sigma_x}\,x.$$
[§ 6, (8)]



* Sur les Probabilités des Erreurs de Situation d'un Point. Mémoires par divers Savants à
l'Académie des Sciences de France, t. IX (1846), pp. 255-332.

† Galton, Proc. Roy. Soc., vol. XL, p. 42 (1886).

‡ Pearson, Phil. Trans. (A), vol. 187, p. 253 (1896); A, vol. 200, p. 1 (1903). Edgeworth,
Phil. Mag., 1892, vol. 34, p. 190 (1903). Yule, Journal of the Royal Statistical Society, vol. 60, p.
812. Cf. Bertrand, Calcul de Probabilités, Chap. IX, Czuber, Theorie der Beobachtungsfehler,
Dritter Teil.




(b) The arrays of the correlation table are normal distributions; that
is, any section of the surface $z = \psi(x, y)$ by assigning x or y is a normal*
(Gaussian) curve.

These conditions, if we make an additional assumption of a purely
analytic nature, lead so readily to (13) that we shall not give the details of
the argument.

Condition (b) implies that $\psi(x, y)$ is of the form $e^{\lambda(x, y)}$, and the
additional assumption is that $\lambda(x, y)$ can be expanded in a convergent power
series in x and y.

Certain Loci on the Plane of Distribution in Normal Correlation. 

9. Ellipses of Equal Probability. — In what follows, it seems to me to add
somewhat to the description to regard points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
(Figs. 2 and 3) as particles of equal mass on the plane of distribution. The
average density of distribution of particles on an area may be defined as the
relative frequency† of particles on that area divided by the area. From this
definition, we may pass to the limiting case and say that z in the surface

$$z = \psi(x, y)$$

gives the limiting value of the density at any point on the plane of
distribution. The curve along which the density of distribution is constant is the
ellipse obtained by assigning a given value to z in equation (13), and
interpreting the result on the plane of distribution. The infinite system
of homothetic ellipses obtained by assigning different values to z plays
an important role in Bravais's fundamental memoir.‡ Such an ellipse is
sometimes called an ellipse of equal probability.§ We shall deal in this
paper (§ 10) with one of these ellipses of special interest. For this purpose,
the equation of any ellipse of the system may be written in the form

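$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y} = \lambda^2. \tag{18}$$

The area of this ellipse is

$$\frac{\pi\lambda^2\sigma_x\sigma_y}{\sqrt{1 - r^2}},$$

and the semiaxes are given by $a = k\lambda$ (19) and $b = k'\lambda$ (20), where k and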



* The normal curve in rectangular coordinates is defined by the equation

$$y = e^{ax^2 + bx + c},$$

where a is negative. It is easily shown that $a = -1/(2\sigma^2)$, where $\sigma^2$ is defined as the second
moment of the area under the curve, about the line $x = -b/(2a)$, divided by the area.

† The "relative frequency" of events and the probability of an event are used
interchangeably in this paper.

‡ Loc. cit.

§ Cf. Bertrand, loc. cit.




k' are functions of $\sigma_x$, $\sigma_y$, and r. The probability that a particle will fall
within any ellipse obtained by assigning $\lambda$ is given by

$$\frac{1}{1 - r^2}\int_0^{\lambda} \lambda\, e^{-\frac{\lambda^2}{2(1 - r^2)}}\,d\lambda = 1 - e^{-\frac{\lambda^2}{2(1 - r^2)}}. \tag{21}$$
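The closed form in (21) can be compared with a simulation. The sketch below is only an illustration under stated assumptions: the parameter values, the seed, and the way correlated normal pairs are generated (from two independent standard normal draws) are choices made here, not part of the paper.

```python
import math
import random

def probability_inside(lam, r):
    """Closed form (21): probability of falling within the ellipse of parameter lam."""
    return 1 - math.exp(-lam * lam / (2 * (1 - r * r)))

def fraction_inside(lam, sx, sy, r, trials=200_000, seed=0):
    """Simulated fraction of normally correlated points inside the ellipse (18)."""
    rng = random.Random(seed)
    s = math.sqrt(1 - r * r)
    inside = 0
    for _ in range(trials):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        x = sx * u                    # x has standard deviation sx
        y = sy * (r * u + s * v)      # y has standard deviation sy, correlation r with x
        q = (x / sx) ** 2 + (y / sy) ** 2 - 2 * r * x * y / (sx * sy)
        if q <= lam * lam:
            inside += 1
    return inside / trials

# Hypothetical parameter values.
sx, sy, r, lam = 2.0, 3.0, 0.5, 1.0
simulated = fraction_inside(lam, sx, sy, r)
exact = probability_inside(lam, r)
```

With these values the closed form gives 1 - exp(-2/3), roughly 0.487, and the simulated fraction should agree to about two decimal places.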

10. The Ellipse of Maximum Probability. — We shall now determine the
ellipse along which, for a given small ring $\Delta\lambda$, we should expect more particles
than along any other ellipse of the system.

The perimeter of the ellipse of semiaxes $k\lambda$ and $k'\lambda$ (§ 9) is given by

$$4k\lambda\int_0^{\pi/2}\sqrt{1 - e^2\sin^2\varphi}\;d\varphi,$$

where $e^2$, the square of the eccentricity, is independent of $\lambda$. Since the
integral is independent of $\lambda$, we may write the perimeter of the ellipse in the
form $k''\lambda$, where $k''$ is independent of $\lambda$. Hence, the total probability that a
particle falls in a small ring between ellipses $\lambda$ and $\lambda + \Delta\lambda$ is given by

$$k''\lambda\, e^{-\frac{\lambda^2}{2(1 - r^2)}}\,\Delta\lambda. \tag{22}$$

This expression is a maximum when $\lambda^2 = 1 - r^2$. Hence, what may well
be called the ellipse of maximum probability is

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y} = 1 - r^2. \tag{23}$$

To illustrate the meaning of this ellipse, in Bertrand's illustration of
shooting a thousand shots at a target, the probability is greater that a
shot will fall along this ellipse than along any other ellipse of the infinite
system.
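The location of the maximum can be confirmed numerically. The sketch below searches a grid for the value of the ellipse parameter that maximizes the factor of (22) depending on it; the function name, the grid, and the value r = 0.6 are assumptions for illustration, and the constant k'' is omitted because it does not affect the position of the maximum.

```python
import math

def ring_weight(lam, r):
    """Factor of expression (22) that depends on lam: lam * exp(-lam^2 / (2(1 - r^2)))."""
    return lam * math.exp(-lam * lam / (2 * (1 - r * r)))

r = 0.6                                       # hypothetical correlation coefficient
grid = [i / 1000 for i in range(1, 3001)]     # candidate values of lam in (0, 3]
best = max(grid, key=lambda lam: ring_weight(lam, r))
```

For r = 0.6 the grid maximum falls at about 0.8, whose square is 0.64 = 1 - r^2, in agreement with the condition stated above.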

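We may further easily prove the following theorem: The ellipse of
maximum probability is identical to the orthogonal projection of parabolic
points of the correlation surface on the plane of distribution.

To prove this theorem, we simply find the locus of parabolic points
on the surface (13) by means of the well-known condition

$$\frac{\partial^2 z}{\partial x^2}\,\frac{\partial^2 z}{\partial y^2} - \left(\frac{\partial^2 z}{\partial x\,\partial y}\right)^2 = 0.$$

This gives

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y} = 1 - r^2,$$

which establishes the theorem.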

Attention has often been called to another ellipse known as the
"probable" ellipse. The probable ellipse is defined as that ellipse of the system
such that the probability is 1/2 that a particle falls within it. This means,
by (21), that $\lambda$ is such that

$$e^{-\frac{\lambda^2}{2(1 - r^2)}} = \frac{1}{2},$$

or

$$\lambda^2 = 1.3863\,(1 - r^2). \tag{24}$$

Hence, the probable ellipse is larger than the ellipse of maximum
probability. In fact, the probability that a particle falls within the ellipse of
maximum probability is $1 - e^{-1/2} = 0.3935$, while that of falling within the
probable ellipse is, by definition, 1/2.

For the illustration of statures of husbands and wives, these two ellipses
are shown in Fig. 3. By actual count from the drawing (Fig. 3), it appears
that 536 of the 1,078 points are within* the probable ellipse and 412 are
within the ellipse of maximum probability. These numbers differ from
the theoretical values by amounts well within what should be expected
with 1,078 points in all.
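The comparison of counts with theory can be made explicit. The sketch below uses only the figures quoted above (1,078 points, counts of 536 and 412, and the theoretical probabilities 1/2 and 1 - exp(-1/2)); treating each count as a binomial variable in order to obtain a standard deviation is an assumption made here for illustration, not a procedure stated in the paper.

```python
import math

n = 1078
observed = {"maximum probability": 412, "probable": 536}
theory = {"maximum probability": 1 - math.exp(-0.5), "probable": 0.5}

results = {}
for name, p in theory.items():
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))   # binomial standard deviation (assumed model)
    results[name] = (expected, sd, (observed[name] - expected) / sd)
```

The expected counts are 539 and about 424.2, each with a standard deviation of roughly 16, so both observed counts lie within one standard deviation of the theoretical values.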

11. Separation of the Plane of Distribution by Lines of Regression 
(Fig. 2) . — The lines of regression and some other lines to be defined presently 
are of such importance in describing the distribution of particles on our 
plane of distribution that we shall consider the probability that a particle 
falls into a given compartment of the plane separated from the rest of the 
plane by these lines. 

Let us take the lines

$$y = k_1 x, \tag{25}$$

$$y = k_2 x, \tag{26}$$

* One half the points on the ellipse are counted within it in making this count.




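and determine the probability that a particle falls into a compartment made
by these lines. The probability is given by the volume under the correlation
surface bounded by (25) and (26), interpreted as planes. Writing $k_1$ and $k_2$
for the slopes of (25) and (26), the probability is, if we make

$$x = \rho\cos\theta, \qquad y = \rho\sin\theta,$$

$$P = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\int_{\arctan k_1}^{\arctan k_2}\!\int_0^{\infty} e^{-\frac{\rho^2}{2(1 - r^2)}\left(\frac{\cos^2\theta}{\sigma_x^2} + \frac{\sin^2\theta}{\sigma_y^2} - \frac{2r\sin\theta\cos\theta}{\sigma_x\sigma_y}\right)}\rho\,d\rho\,d\theta$$

$$= \frac{1}{2\pi}\left[\arctan\frac{\sigma_x k_2 - r\sigma_y}{\sigma_y\sqrt{1 - r^2}} - \arctan\frac{\sigma_x k_1 - r\sigma_y}{\sigma_y\sqrt{1 - r^2}}\right]. \tag{27}$$

If (25) and (26) are lines of regression so that

$$k_1 = r\,\frac{\sigma_y}{\sigma_x}, \qquad k_2 = \frac{1}{r}\,\frac{\sigma_y}{\sigma_x},$$

then

$$P = \frac{1}{2\pi}\arccos r \qquad (0 \le \arccos r \le \pi). \tag{28}$$

When r is positive, this is the probability that a point will fall into one
of the compartments of the smaller angles between the lines of regression.

To find the probability that a particle falls into this same region under
independence of X and Y, we make r = 0 in (27) before making the
substitutions

$$k_1 = r\,\frac{\sigma_y}{\sigma_x}, \qquad k_2 = \frac{1}{r}\,\frac{\sigma_y}{\sigma_x}.$$

This gives

$$P = \frac{1}{2\pi}\arccos\frac{2r}{1 + r^2}.$$

Hence, the excess relative frequency in this compartment of the table is

$$\frac{1}{2\pi}\left[\arccos r - \arccos\frac{2r}{1 + r^2}\right].$$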
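The arc tangent formula of this section lends itself to a direct numerical check. In the sketch below the function name, the symbols `k1` and `k2` for the slopes of the two bounding lines, and the parameter values are assumptions made for illustration. The function evaluates the difference of arc tangents divided by 2π, and the two calls reproduce the regression-line case under correlation and under independence.

```python
import math

def sector_probability(k1, k2, sx, sy, r):
    """Probability that a particle falls between the half-lines y = k1*x and
    y = k2*x (x > 0) under normal correlation, by the arc tangent formula."""
    s = math.sqrt(1 - r * r)
    upper = math.atan((sx * k2 - r * sy) / (sy * s))
    lower = math.atan((sx * k1 - r * sy) / (sy * s))
    return (upper - lower) / (2 * math.pi)

# Hypothetical parameter values.
sx, sy, r = 2.0, 3.0, 0.5
k1 = r * sy / sx          # slope of the regression line of y on x
k2 = sy / (r * sx)        # slope of the regression line of x on y

p_correlated = sector_probability(k1, k2, sx, sy, r)
p_independent = sector_probability(k1, k2, sx, sy, 0.0)  # r = 0 in the formula only
excess = p_correlated - p_independent
```

For r = 0.5 the first value is arccos(0.5)/(2π) = 1/6 and the second is arccos(0.8)/(2π), about 0.102, so the excess relative frequency in the compartment is about 0.064.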

12. Loci Along Which the Frequency of Particles Bears a Simple
Relation to the Frequency under Independence. — If the equality [§ 6,
(7)] $\psi(x, y) = f(x)\,\varphi(y)$ is not an identity, it may be interpreted as the
curve in the plane of distribution along which particles are distributed with
the same frequency as they would be under independence. This curve
and its separation of the plane of distribution have been treated by
Pearson* for the case of normal correlation.



* Drapers' Company Research Memoirs, Biometric Series, I, XIII. 




That treatment is easily extended for normal correlation to find the
curve along which points are k times as frequent as on the hypothesis of
independence. I call k the intensity of distribution with respect to
independence. In the general case, this curve has the equation

$$\psi(x, y) = k\,f(x)\,\varphi(y). \tag{30}$$

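For normal correlation, (30) takes the form

$$\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1 - r^2)}\left[\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right]} = \frac{k}{2\pi\sigma_x\sigma_y}\; e^{-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}},$$

which may be easily simplified to

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2xy}{r\,\sigma_x\sigma_y} = -\frac{1 - r^2}{r^2}\,\ln\left[k^2(1 - r^2)\right]. \tag{31}$$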

This hyperbola divides the plane of distribution into two regions in one
of which the intensity of distribution is greater than k, while in the other
it is less than k. By giving different values to k, we obtain an infinite
system of hyperbolas. The case of

$$k^2 = \frac{1}{1 - r^2}$$

is of special interest as it may be regarded as the intensity of distribution
at the centroid of particles. Making

$$k = \frac{1}{\sqrt{1 - r^2}}$$

in (31) causes the equation to degenerate into the two straight lines

$$y = \frac{\sigma_y}{r\,\sigma_x}\,x\left(1 \pm \sqrt{1 - r^2}\right). \tag{32}$$

These lines are shown as lines AB and CD on Fig. 3. The hyperbola
along which the frequency is the same as under independence is also shown
on the same figure. The probability that a particle will fall into a specified
one of the four compartments [made by lines (32)] of the plane of distribution
is given by substituting the slopes of lines (32) for the two slopes in (27).
This gives for the probability 1/4. That is, the probability is just the same
that a particle belongs to one of the four compartments as to any other.

The probability that a point belongs to the region of one of these
compartments, in case of independence, is

$$\frac{1}{2\pi}\arccos r.$$
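The intensity of distribution can be evaluated pointwise as the ratio of the normal correlation surface to the product of the two normal marginal distributions. The sketch below is an illustration under assumed parameter values and a hypothetical function name; it checks that the intensity at the centroid, and at a point on one of the two degenerate straight lines, equals the reciprocal of the square root of 1 - r^2.

```python
import math

def intensity(x, y, sx, sy, r):
    """Ratio of the normal correlation density at (x, y) to the product of the
    normal marginal densities, i.e. the intensity k with respect to independence."""
    u, v = x / sx, y / sy
    q = u * u + v * v - 2 * r * u * v
    log_k = -q / (2 * (1 - r * r)) + 0.5 * (u * u + v * v) - 0.5 * math.log(1 - r * r)
    return math.exp(log_k)

# Hypothetical parameter values.
sx, sy, r = 2.0, 3.0, 0.5
s = math.sqrt(1 - r * r)

k_centroid = intensity(0.0, 0.0, sx, sy, r)

# A point on one of the two straight lines into which the hyperbola degenerates.
x0 = 1.0
y0 = sy / (r * sx) * x0 * (1 + s)
k_on_line = intensity(x0, y0, sx, sy, r)
```

Both values come out equal to 1/sqrt(0.75), about 1.155, for r = 0.5.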




13. Separation of Plane by What Would Be Lines of Regression Under
Independence. — These lines are simply our coordinate axes. The
probability that a particle falls into a specified quadrant under independence
is, of course, 1/4. The probability that a particle will fall into the first
quadrant, in the case of normal correlation, is given by (27) as

$$\frac{1}{4} + \frac{1}{2\pi}\arcsin r \qquad \left(-\frac{\pi}{2} < \arcsin r < \frac{\pi}{2}\right).$$

And the probability of falling into the fourth quadrant is

$$\frac{1}{4} - \frac{1}{2\pi}\arcsin r \qquad \left(-\frac{\pi}{2} < \arcsin r < \frac{\pi}{2}\right).$$

Hence, we may say, in the case of normal correlation, if we know only
in regard to a variate of class X that it is above the mean, that the odds
are $\frac{1}{4} + \frac{1}{2\pi}\arcsin r$ to $\frac{1}{4} - \frac{1}{2\pi}\arcsin r$ that the corresponding variate
of class Y is above the mean.

In our example of the correlation of husbands and wives in stature,
the numerical values of the odds are 0.2962 to 0.2038 or 3 to 2 approximately
that the stature of the wife is above the mean if it is given that the
husband is above the mean of husbands in stature.
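The quadrant probabilities and the quoted odds can be reproduced in a few lines. The value of r for the husband and wife example is not stated in this part of the text, so the sketch below backs it out from the quoted first-quadrant probability 0.2962; that inversion, and the function names, are assumptions made for illustration.

```python
import math

def quadrant_probabilities(r):
    """Return (first-quadrant probability, fourth-quadrant probability)
    under normal correlation with coefficient r."""
    shift = math.asin(r) / (2 * math.pi)
    return 0.25 + shift, 0.25 - shift

def r_from_first_quadrant(p_first):
    """Invert the first-quadrant formula to recover r from a stated probability."""
    return math.sin(2 * math.pi * (p_first - 0.25))

r_implied = r_from_first_quadrant(0.2962)   # implied by the odds quoted in the text
p_first, p_fourth = quadrant_probabilities(r_implied)
odds = p_first / p_fourth
```

The implied correlation is a little under 0.29, and the ratio of the two probabilities is about 1.45, which is the "3 to 2 approximately" of the text.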

The study of the separations of the plane of distribution by such lines
as those here considered throws considerable light on the character of
normally correlated statistical data.

University of Illinois.



